Abstract

A family of regularization functionals is said to admit a linear representer theorem if every member of the family admits minimizers that lie in a fixed finite-dimensional subspace. A recent characterization states that a general class of regularization functionals with differentiable regularizer admits a linear representer theorem if and only if the regularization term is a non-decreasing function of the norm. In this report, we improve over this result by replacing the differentiability assumption with lower semicontinuity and deriving a proof that is independent of the dimensionality of the space.

1 Introduction 

Tikhonov regularization [13] is a popular and well-studied methodology to address ill-posed estimation problems [15], and learning from examples [4]. In this report, we focus on regularization problems defined over a real Hilbert space $\mathcal{H}$. A Hilbert space is a vector space endowed with an inner product and a norm that is complete¹. This setting is general enough to take into account a broad family of finite-dimensional regularization techniques, such as regularized least squares or support vector machines for classification or regression, and kernel principal component analysis, as well as a variety of regularization problems defined over infinite-dimensional reproducing kernel Hilbert spaces (RKHS).

In general, we study the problem of minimizing an extended real-valued functional $J : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ of the form

$$J(w) = f(L_1 w, \ldots, L_\ell w) + \Omega(w), \tag{1}$$

where $L_1, \ldots, L_\ell$ are bounded (continuous) linear functionals on $\mathcal{H}$. The functional $J$ is the sum of an error term $f$, which typically depends on empirical data, and a regularizer $\Omega$ that enforces certain desirable properties on the solution. By allowing the functional $J$ to take the value $+\infty$, problems with hard constraints on the values $L_i w$ are included in the framework.

¹ Meaning that Cauchy sequences are convergent.

In machine learning, the most common class of regularization problems concerns a situation where a set of data pairs $(x_i, y_i)$ is available, $\mathcal{H}$ is a space of real-valued functions, and the objective functional to be minimized is of the form

$$J(w) = c\big((x_1, y_1, w(x_1)), \ldots, (x_\ell, y_\ell, w(x_\ell))\big) + \Omega(w).$$

It is easy to see that this setting is a particular case of (1). Indeed, the dependence on the data pairs $(x_i, y_i)$ can be absorbed into the definition of $f$, and the $L_i$ are point-wise evaluation functionals, i.e. $L_i w = w(x_i)$. Several popular techniques can be cast in this regularization framework.

Example 1 (Regularized least squares). Also known as ridge regression when $\mathcal{H}$ is finite-dimensional. It corresponds to the choice

$$c\big((x_1, y_1, w(x_1)), \ldots, (x_\ell, y_\ell, w(x_\ell))\big) = \gamma \sum_{i=1}^{\ell} \big(y_i - w(x_i)\big)^2,$$

and $\Omega(w) = \|w\|^2$, where the complexity parameter $\gamma > 0$ controls the trade-off between fitting of training data and regularity of the solution.
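As a minimal sketch of Example 1 under the representer theorem, one can search for $w = \sum_i c_i K_{x_i}$ in an RKHS with kernel $K$. Substituting into the functional gives $\gamma \|y - Kc\|^2 + c^\top K c$, with kernel matrix $K_{ij} = K(x_i, x_j)$, and any $c$ solving $(K + I/\gamma)\, c = y$ makes its gradient vanish. The Gaussian kernel, the toy data, and the helper names `gaussian_kernel`, `solve`, `fit_rls` and `predict` are illustrative assumptions, not part of the text.

```python
import math

def gaussian_kernel(a, b, width=1.0):
    # Illustrative choice of reproducing kernel on the real line.
    return math.exp(-((a - b) ** 2) / (2.0 * width ** 2))

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system A x = b.
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_rls(xs, ys, gamma):
    # Coefficients c of w = sum_i c_i K_{x_i}: solve (K + I / gamma) c = y.
    n = len(xs)
    A = [[gaussian_kernel(xs[i], xs[j]) + (1.0 / gamma if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(A, ys)

def predict(c, xs, x):
    # w(x) = sum_i c_i K(x_i, x), by the reproducing property.
    return sum(ci * gaussian_kernel(xi, x) for ci, xi in zip(c, xs))

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 0.8, 0.9, 0.1]
c = fit_rls(xs, ys, gamma=10.0)
print([round(predict(c, xs, x), 3) for x in xs])
```

The solution satisfies $c = \gamma\,(y - Kc)$, i.e. the residuals are $c/\gamma$; as $\gamma$ grows, the fitted values approach the labels.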

Example 2 (Support vector machine). Given binary labels $y_i = \pm 1$, the SVM classifier can be interpreted as a regularization method corresponding to the choice

$$c\big((x_1, y_1, w(x_1)), \ldots, (x_\ell, y_\ell, w(x_\ell))\big) = \gamma \sum_{i=1}^{\ell} \max\{0, 1 - y_i w(x_i)\},$$

and $\Omega(w) = \|w\|^2$. The hard-margin SVM can be recovered by letting $\gamma \to +\infty$.
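With $w = \sum_j c_j K_{x_j}$, the SVM functional of Example 2 becomes a function of $c$ only: $\gamma \sum_i \max\{0, 1 - y_i (Kc)_i\} + c^\top K c$. The sketch below minimizes it with plain subgradient descent and diminishing step sizes; this is a deliberately simple, slow method chosen for illustration (practical SVM solvers work differently), and the kernel, data and function names are hypothetical.

```python
import math

def gaussian_kernel(a, b, width=1.0):
    # Illustrative choice of reproducing kernel on the real line.
    return math.exp(-((a - b) ** 2) / (2.0 * width ** 2))

def svm_objective(c, K, ys, gamma):
    # J(c) = gamma * sum_i max(0, 1 - y_i (Kc)_i) + c^T K c, with w = sum_j c_j K_{x_j}.
    n = len(ys)
    Kc = [sum(K[i][j] * c[j] for j in range(n)) for i in range(n)]
    hinge = sum(max(0.0, 1.0 - ys[i] * Kc[i]) for i in range(n))
    return gamma * hinge + sum(c[i] * Kc[i] for i in range(n))

def svm_subgradient(xs, ys, gamma, steps=2000, eta0=0.1):
    n = len(xs)
    K = [[gaussian_kernel(a, b) for b in xs] for a in xs]
    c = [0.0] * n
    best_c, best_val = c[:], svm_objective(c, K, ys, gamma)
    for t in range(1, steps + 1):
        Kc = [sum(K[i][j] * c[j] for j in range(n)) for i in range(n)]
        # Subgradient: 2 K c minus gamma * y_i * K[i] for every margin-violating point i.
        g = [2.0 * v for v in Kc]
        for i in range(n):
            if ys[i] * Kc[i] < 1.0:
                for j in range(n):
                    g[j] -= gamma * ys[i] * K[i][j]
        eta = eta0 / math.sqrt(t)  # diminishing step size
        c = [cj - eta * gj for cj, gj in zip(c, g)]
        val = svm_objective(c, K, ys, gamma)
        if val < best_val:  # subgradient steps are not monotone; keep the best iterate
            best_c, best_val = c[:], val
    return best_c, best_val

xs = [-2.0, -1.0, 1.0, 2.0]
ys = [-1.0, -1.0, 1.0, 1.0]
c, val = svm_subgradient(xs, ys, gamma=1.0)
print(round(val, 4))
```

At $c = 0$ the objective equals $\gamma \ell$ (here 4), so any improvement found by the iterations shows up as a smaller returned value.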

Example 3 (Kernel principal component analysis). Kernel PCA can be shown to be equivalent to a regularization problem where

$$c\big((x_1, y_1, w(x_1)), \ldots, (x_\ell, y_\ell, w(x_\ell))\big) = \begin{cases} 0, & \frac{1}{\ell} \sum_{i=1}^{\ell} w(x_i)^2 = 1, \\ +\infty, & \text{otherwise}, \end{cases}$$

and $\Omega$ is any strictly monotonically increasing function of the norm $\|w\|_{\mathcal{H}}$. In this problem, there are no labels $y_i$, but the feature extractor function $w$ is constrained to produce vectors with unit empirical variance.

Within the formulation (1), the possibility of using general continuous linear functionals $L_i$ allows one to consider a much broader class of regularization problems.
Example 4 (Tikhonov deconvolution). Given an input signal $u$, assume that the convolution $u * w$ is well-defined for any $w \in \mathcal{H}$, and that the point-wise evaluated convolution functionals

$$L_i w = (u * w)(x_i) = \int_{\mathcal{X}} u(s)\, w(x_i - s)\, ds$$

are continuous. A possible way to recover $w$ from noisy measurements $y_i$ of the "output signal" is to solve regularization problems such as

$$\min_{w \in \mathcal{H}} \left( \gamma \sum_{i=1}^{\ell} \big(y_i - (u * w)(x_i)\big)^2 + \|w\|^2 \right),$$

where the objective functional is of the form (1).
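A finite-dimensional sketch of Example 4, assuming (hypothetically) $\mathcal{H} = \mathbb{R}^m$ with the Euclidean inner product and the discrete convolution $(u * w)[x] = \sum_s u[s]\, w[x - s]$, with indices falling outside $w$ dropped. Each $L_i$ then has the representer $w_i[t] = u[x_i - t]$, and the minimizer can be sought as $w = \sum_i c_i w_i$ with $(G + I/\gamma)\, c = y$, where $G_{ij} = \langle w_i, w_j \rangle$. The signal, measurement locations and helper names are illustrative.

```python
def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system A x = b.
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def conv_at(u, w, x):
    # Discrete (u * w)[x] = sum_s u[s] w[x - s], dropping indices outside w.
    return sum(u[s] * w[x - s] for s in range(len(u)) if 0 <= x - s < len(w))

def representer(u, x, m):
    # Vector r in R^m with <r, w> = (u * w)[x] for every w in R^m.
    return [u[x - t] if 0 <= x - t < len(u) else 0.0 for t in range(m)]

def deconvolve(u, xs, ys, m, gamma):
    # Minimizer of gamma * sum_i (y_i - (u*w)[x_i])^2 + ||w||^2 over R^m,
    # searched in span{r_1, ..., r_l}: (G + I / gamma) c = y, w = sum_i c_i r_i.
    reps = [representer(u, x, m) for x in xs]
    n = len(xs)
    G = [[sum(a * b for a, b in zip(reps[i], reps[j])) for j in range(n)] for i in range(n)]
    A = [[G[i][j] + (1.0 / gamma if i == j else 0.0) for j in range(n)] for i in range(n)]
    c = solve(A, ys)
    return [sum(c[i] * reps[i][t] for i in range(n)) for t in range(m)]

u = [0.5, 0.3, 0.2]              # hypothetical input signal (filter taps)
xs = [1, 2, 3, 4, 5]             # measurement locations
ys = [0.4, 1.0, 0.7, 0.2, 0.1]   # noisy measurements of the output signal
m, gamma = 6, 5.0
w = deconvolve(u, xs, ys, m, gamma)
print([round(v, 4) for v in w])
```

Because the objective is convex, a vanishing gradient over all of $\mathbb{R}^m$ at this $w$ confirms that restricting the search to the span of the representers lost nothing.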

Example 5 (Learning from probability measures). In many classical learning problems, it is appropriate to represent input training data as probability distributions instead of single points. Given a finite set of probability measures $P_i$ on a measurable space $(\mathcal{X}, \mathcal{A})$, where $\mathcal{A}$ is a $\sigma$-algebra of subsets of $\mathcal{X}$, introduce the expectations

$$L_i w = \mathbb{E}_{P_i}(w) = \int_{\mathcal{X}} w(x)\, dP_i(x).$$

Then, given output labels $y_i$, one can learn an input-output relationship by solving regularization problems of the form

$$\min_{w \in \mathcal{H}} \Big( c\big((y_1, \mathbb{E}_{P_1}(w)), \ldots, (y_\ell, \mathbb{E}_{P_\ell}(w))\big) + \|w\|^2 \Big).$$

If the expectations are bounded linear functionals, such a regularization functional is of the form (1).

Example 6 (Ivanov regularization). By allowing the regularizer to take the value $+\infty$, we can also take into account the whole class of Ivanov-type regularization problems of the form

$$\min_{w \in \mathcal{H}} f(L_1 w, \ldots, L_\ell w), \quad \text{subject to } \phi(w) \le 1,$$

by reformulating them as the minimization of a functional of the type (1), where

$$\Omega(w) = \begin{cases} 0, & \phi(w) \le 1, \\ +\infty, & \text{otherwise}. \end{cases}$$

Let's now go back to the general formulation (1). By the Riesz representation theorem [8], [5], $J$ can be rewritten as

$$J(w) = f\big(\langle w, w_1 \rangle, \ldots, \langle w, w_\ell \rangle\big) + \Omega(w),$$

where $w_i$ is the representer of the linear functional $L_i$ with respect to the inner product. Consider the following definition.
Definition 1. A family $\mathcal{F}$ of regularization functionals of the form (1) is said to admit a linear representer theorem if, for any $J \in \mathcal{F}$ and any choice of bounded linear functionals $L_i$, there exists a minimizer $w^*$ that can be written as a linear combination of the representers:

$$w^* = \sum_{i=1}^{\ell} c_i w_i.$$

If a linear representer theorem holds, the regularization problem boils down to an $\ell$-dimensional optimization problem on the scalar coefficients $c_i$. This property is important in practice, since it allows one to employ numerical optimization techniques to compute a solution, independently of the dimension of $\mathcal{H}$. Sufficient conditions under which a family of functionals admits a representer theorem have been widely studied in the literature of statistics, inverse problems, and machine learning. The theorem also provides the foundations of learning techniques such as regularized kernel methods and support vector machines, see
[HE1 HOI [T2"] and references therein. 

Representer theorems are of particular interest when $\mathcal{H}$ is a reproducing kernel Hilbert space (RKHS) [2]. Given a non-empty set $\mathcal{X}$, an RKHS is a space of functions $w : \mathcal{X} \to \mathbb{R}$ such that point-wise evaluation functionals are bounded, namely, for any $x \in \mathcal{X}$, there exists a non-negative real number $C_x$ such that

$$|w(x)| \le C_x \|w\|, \quad \forall w \in \mathcal{H}.$$

It can be shown that an RKHS can be uniquely associated to a positive semidefinite kernel function $K : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ (called the reproducing kernel), such that the so-called reproducing property holds:

$$w(x) = \langle w, K_x \rangle, \quad \forall (x, w) \in \mathcal{X} \times \mathcal{H},$$

where the kernel sections $K_x$ are defined as

$$K_x(y) = K(x, y), \quad \forall y \in \mathcal{X}.$$
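For a finite combination $w = \sum_j a_j K_{z_j}$, the reproducing property gives $\|w\|^2 = \sum_{j,k} a_j a_k K(z_j, z_k)$, and Cauchy-Schwarz gives $|w(x)| = |\langle w, K_x \rangle| \le \|K_x\|\,\|w\| = \sqrt{K(x, x)}\,\|w\|$, so $C_x = \sqrt{K(x, x)}$ is one valid constant. The sketch below checks this numerically, assuming a Gaussian kernel and arbitrary coefficients and centers (all hypothetical).

```python
import math

def gaussian_kernel(a, b, width=1.0):
    # Illustrative positive semidefinite kernel on the real line.
    return math.exp(-((a - b) ** 2) / (2.0 * width ** 2))

def evaluate(coeffs, centers, x):
    # w(x) = <w, K_x> = sum_j a_j K(z_j, x) for w = sum_j a_j K_{z_j}.
    return sum(a * gaussian_kernel(z, x) for a, z in zip(coeffs, centers))

def rkhs_norm(coeffs, centers):
    # ||w||^2 = sum_{j,k} a_j a_k <K_{z_j}, K_{z_k}> = sum_{j,k} a_j a_k K(z_j, z_k).
    sq = sum(aj * ak * gaussian_kernel(zj, zk)
             for aj, zj in zip(coeffs, centers)
             for ak, zk in zip(coeffs, centers))
    return math.sqrt(max(sq, 0.0))  # guard against tiny negative round-off

coeffs, centers = [1.0, -2.0, 0.5], [0.0, 0.7, 2.0]
for x in [-1.0, 0.0, 0.35, 1.5, 3.0]:
    bound = math.sqrt(gaussian_kernel(x, x)) * rkhs_norm(coeffs, centers)  # C_x ||w||
    print(x, abs(evaluate(coeffs, centers, x)) <= bound)
```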

The reproducing property states that the representers of point-wise evaluation functionals coincide with the kernel sections. Starting from the reproducing property, it is also easy to show that the representer of any bounded linear functional $L$ is given by a function $K_L \in \mathcal{H}$ such that

$$K_L(x) = L K_x, \quad \forall x \in \mathcal{X},$$

since $K_L(x) = \langle K_L, K_x \rangle = L K_x$. Therefore, in an RKHS, the representer of any bounded linear functional can be obtained explicitly in terms of the reproducing kernel.
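As an illustration for the expectation functionals of Example 5, assume (hypothetically) that $P$ is a discrete measure with atoms $t_k$ and weights $p_k$, so that $L w = \sum_k p_k w(t_k)$. The formula above then gives $K_L(x) = L K_x = \sum_k p_k K(t_k, x)$, and for $w = \sum_j a_j K_{z_j}$ the reproducing property yields $\langle w, K_L \rangle = \sum_j a_j K_L(z_j)$, which should coincide with $L w$. The Gaussian kernel and all numbers are assumptions.

```python
import math

def gaussian_kernel(a, b, width=1.0):
    # Illustrative positive semidefinite kernel on the real line.
    return math.exp(-((a - b) ** 2) / (2.0 * width ** 2))

def expectation_representer(atoms, weights):
    # K_L(x) = L K_x = sum_k p_k K(t_k, x) for L w = sum_k p_k w(t_k).
    return lambda x: sum(p * gaussian_kernel(t, x) for p, t in zip(weights, atoms))

def expectation(atoms, weights, w):
    # L w = E_P(w) for the discrete measure P = sum_k p_k delta_{t_k}.
    return sum(p * w(t) for p, t in zip(weights, atoms))

atoms, weights = [0.0, 1.0, 2.5], [0.2, 0.5, 0.3]
coeffs, centers = [1.0, -0.5], [0.3, 1.8]
w = lambda x: sum(a * gaussian_kernel(z, x) for a, z in zip(coeffs, centers))
K_L = expectation_representer(atoms, weights)
# <w, K_L> = sum_j a_j <K_{z_j}, K_L> = sum_j a_j K_L(z_j) by the reproducing property.
inner = sum(a * K_L(z) for a, z in zip(coeffs, centers))
print(inner, expectation(atoms, weights, w))
```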

If the regularization functional (1) admits minimizers, and the regularizer is a nondecreasing function of the norm, i.e.

$$\Omega(w) = h(\|w\|), \quad \text{with } h : \mathbb{R} \to \mathbb{R} \cup \{+\infty\} \text{ nondecreasing}, \tag{2}$$

the linear representer theorem follows easily from the Pythagorean identity. A proof that condition (2) is sufficient appeared in [9] in the case where $\mathcal{H}$ is an RKHS and the $L_i$ are point-wise evaluation functionals. Earlier instances of representer theorems can be found in [5], [3], [7]. More recently, the question of whether condition (2) is also necessary for the existence of linear representer theorems has been investigated [1]. In particular, [1] shows that, if $\Omega$ is differentiable (and certain technical existence conditions hold), then (2) is necessary and sufficient. The proof of [1] heavily exploits differentiability of $\Omega$, but the authors conjecture that the hypothesis can be relaxed. In this report, we show that (2) is necessary and sufficient for the family of regularization functionals of the form (1) to admit a linear representer theorem, by merely assuming that $\Omega$ is lower semicontinuous and satisfies basic conditions for the existence of minimizers. The proof is based on a characterization of radial nondecreasing functionals on a Hilbert space.
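The Pythagorean argument can be checked in a small example, assuming (hypothetically) $\mathcal{H} = \mathbb{R}^4$ and two linearly independent representers. Projecting any candidate $w$ onto $\mathcal{R} = \mathrm{span}\{w_1, w_2\}$ leaves every $\langle w, w_i \rangle$ unchanged while $\|w\|^2 = \|u\|^2 + \|v\|^2$, so a regularizer of the form (2) cannot increase. The vectors and helper names are illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system A x = b.
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def project(w, reps):
    # Orthogonal projection u of w onto span{w_1, ..., w_l} (assumed linearly independent):
    # solve the normal equations G c = (<w, w_i>)_i with G the Gram matrix.
    G = [[dot(a, b) for b in reps] for a in reps]
    c = solve(G, [dot(w, r) for r in reps])
    return [sum(ci * r[t] for ci, r in zip(c, reps)) for t in range(len(w))]

reps = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0]]  # representers w_1, w_2 in R^4
w = [0.3, -1.2, 2.0, 0.7]                             # an arbitrary candidate solution
u = project(w, reps)
v = [a - b for a, b in zip(w, u)]                     # component in the orthogonal complement
print([dot(u, r) - dot(w, r) for r in reps], dot(u, u) <= dot(w, w))
```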

2 A characterization of radial nondecreasing functionals

In this section, we present a characterization of radial nondecreasing functionals defined over Hilbert spaces. We will make use of the following definition.

Definition 2. A subset $S$ of a Hilbert space $\mathcal{H}$ is called star-shaped with respect to a point $z \in \mathcal{H}$ if

$$(1 - \lambda) z + \lambda x \in S, \quad \forall x \in S, \ \forall \lambda \in [0, 1].$$

It is easy to verify that a convex set is star-shaped with respect to any point of the set, whereas a star-shaped set does not have to be convex.
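A standard example of the last remark, chosen here for illustration: in $\mathbb{R}^2$, the union of the two segments $[-1, 1] \times \{0\}$ and $\{0\} \times [-1, 1]$ is star-shaped with respect to the origin but not convex. The sketch checks membership of segment points on a finite grid of $\lambda$ values, so it is a numerical illustration rather than a proof; the helper names are hypothetical.

```python
def in_cross(p, tol=1e-12):
    # S = ([-1, 1] x {0}) U ({0} x [-1, 1]): union of two segments in R^2.
    a, b = p
    return (abs(b) <= tol and abs(a) <= 1 + tol) or (abs(a) <= tol and abs(b) <= 1 + tol)

def segment_inside(z, x, steps=100):
    # Grid check that (1 - lam) z + lam x lies in S for lam = k / steps, k = 0..steps.
    return all(in_cross(((1 - k / steps) * z[0] + k / steps * x[0],
                         (1 - k / steps) * z[1] + k / steps * x[1]))
               for k in range(steps + 1))

origin = (0.0, 0.0)
samples = [(1.0, 0.0), (-0.4, 0.0), (0.0, 0.9), (0.0, -1.0)]
print(all(segment_inside(origin, p) for p in samples))  # star-shaped w.r.t. the origin
print(segment_inside((1.0, 0.0), (0.0, 1.0)))           # the midpoint (0.5, 0.5) is not in S
```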

The following theorem provides a geometric characterization of radial nondecreasing functions defined on a Hilbert space that generalizes the analogous result of [1] for differentiable functions.

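Theorem 1. Let $\mathcal{H}$ denote a Hilbert space such that $\dim \mathcal{H} \ge 2$, and let $\Omega : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ be a lower semicontinuous function. Then, (2) holds if and only if

$$\Omega(x + y) \ge \max\{\Omega(x), \Omega(y)\}, \quad \forall x, y \in \mathcal{H} : \langle x, y \rangle = 0. \tag{3}$$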
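Condition (3) can be probed numerically in $\mathbb{R}^2$. The sketch below, with hypothetical regularizers chosen for illustration, compares a radial nondecreasing $\Omega(w) = \min\{\|w\|, 1\}$ (nondecreasing but not strictly) with the non-radial $\Omega(w) = w_1^2 + 4 w_2^2$. For the latter, $x = (1, 1)$ and $y = (0.5, -0.5)$ are orthogonal, yet $\Omega(x + y) = 3.25 < 5 = \Omega(x)$.

```python
import math
import random

def radial_omega(w):
    # Omega(w) = h(||w||) with the nondecreasing (not strictly) h(t) = min(t, 1).
    return min(math.hypot(w[0], w[1]), 1.0)

def weighted_omega(w):
    # A non-radial regularizer: Omega(w) = w_1^2 + 4 w_2^2.
    return w[0] ** 2 + 4.0 * w[1] ** 2

def violates_condition_3(omega, x, y, tol=1e-12):
    # True if Omega(x + y) < max{Omega(x), Omega(y)} for the given (orthogonal) pair.
    s = (x[0] + y[0], x[1] + y[1])
    return omega(s) < max(omega(x), omega(y)) - tol

def random_orthogonal_pair(rng):
    # x arbitrary, y = t * (x rotated by 90 degrees), so <x, y> = 0.
    x = (rng.uniform(-2, 2), rng.uniform(-2, 2))
    t = rng.uniform(-2, 2)
    return x, (-t * x[1], t * x[0])

rng = random.Random(0)
pairs = [random_orthogonal_pair(rng) for _ in range(1000)]
print(any(violates_condition_3(radial_omega, x, y) for x, y in pairs))
print(violates_condition_3(weighted_omega, (1.0, 1.0), (0.5, -0.5)))
```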

Proof. Assume that (2) holds. Then, for any pair of orthogonal vectors $x, y \in \mathcal{H}$, we have

$$\Omega(x + y) = h(\|x + y\|) = h\Big(\sqrt{\|x\|^2 + \|y\|^2}\Big) \ge \max\{h(\|x\|), h(\|y\|)\} = \max\{\Omega(x), \Omega(y)\}.$$

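Conversely, assume that condition (3) holds. Since $\dim \mathcal{H} \ge 2$, by fixing a generic vector $x \in \mathcal{H} \setminus \{0\}$ and a number $\lambda \in [0, 1]$, there exists a vector $y$ such that $\|y\| = 1$ and

$$\lambda = 1 - \cos^2 \theta,$$

where

$$\cos \theta = \frac{\langle x, y \rangle}{\|x\| \|y\|}$$

(take, e.g., $y = \sqrt{1 - \lambda}\, x / \|x\| + \sqrt{\lambda}\, z$, with $z$ a unit vector orthogonal to $x$). In view of (3), we have

$$\Omega(x) = \Omega\big(x - \langle x, y \rangle y + \langle x, y \rangle y\big) \ge \Omega\big(x - \langle x, y \rangle y\big) = \Omega\big(\lambda x + \cos^2\theta\, x - \langle x, y \rangle y\big) \ge \Omega(\lambda x).$$

Both inequalities are instances of (3): the first because $\langle x - \langle x, y \rangle y, y \rangle = 0$, and the second because $\langle \lambda x, \cos^2\theta\, x - \langle x, y \rangle y \rangle = \lambda\big(\cos^2\theta\, \|x\|^2 - \langle x, y \rangle^2\big) = 0$.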

Since the last inequality trivially holds also when $x = 0$, we conclude that

$$\Omega(x) \ge \Omega(\lambda x), \quad \forall x \in \mathcal{H}, \ \forall \lambda \in [0, 1], \tag{4}$$

so that $\Omega$ is nondecreasing along all the rays passing through the origin. In particular, the minimum of $\Omega$ is attained at $x = 0$.

Now, for any $c \ge \Omega(0)$, consider the sublevel sets

$$S_c = \{x \in \mathcal{H} : \Omega(x) \le c\}.$$

From (4), it follows that $S_c$ is not empty and star-shaped with respect to the origin. In addition, since $\Omega$ is lower semicontinuous, $S_c$ is also closed. We now show that $S_c$ is either a closed ball centered at the origin, or the whole space.
To this end, we show that, for any $x \in S_c$, the whole ball

$$B = \{y \in \mathcal{H} : \|y\| \le \|x\|\}$$

is contained in $S_c$. First, take any $y \in \mathrm{int}(B) \setminus \mathrm{span}\{x\}$, where int denotes the interior. Then, $y$ has norm strictly less than $\|x\|$, that is

$$0 < \|y\| < \|x\|,$$

and is not aligned with $x$, i.e.

$$y \ne \lambda x, \quad \forall \lambda \in \mathbb{R}.$$

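Let $\theta \in (0, \pi)$ denote the angle between $x$ and $y$. Now, construct a sequence of points $x_k$ as follows:

$$x_0 = y, \qquad x_{k+1} = x_k + a_k u_k,$$

where

$$a_k = \|x_k\| \tan\left(\frac{\theta}{n}\right), \quad n \in \mathbb{N}, \ n \ge 2,$$

so that $\theta / n < \pi / 2$, and $u_k$ is the unique unit vector that is orthogonal to $x_k$, belongs to the two-dimensional subspace $\mathrm{span}\{x, y\}$, and is such that $\langle u_k, x \rangle > 0$, that is

$$u_k \in \mathrm{span}\{x, y\}, \quad \|u_k\| = 1, \quad \langle u_k, x_k \rangle = 0, \quad \langle u_k, x \rangle > 0.$$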
By orthogonality, we have

$$\|x_{k+1}\|^2 = \|x_k\|^2 + a_k^2 = \|x_k\|^2 \left(1 + \tan^2\frac{\theta}{n}\right) = \|y\|^2 \left(1 + \tan^2\frac{\theta}{n}\right)^{k+1}. \tag{5}$$

In addition, the angle between $x_{k+1}$ and $x_k$ is given by

$$\theta_k = \arctan\left(\frac{a_k}{\|x_k\|}\right) = \frac{\theta}{n},$$

so that the total angle between $y$ and $x_n$ is given by

$$\sum_{k=0}^{n-1} \theta_k = \theta.$$

Since all the points $x_k$ belong to the subspace spanned by $x$ and $y$, each step rotates towards $x$, and the angle between $x$ and $x_n$ is zero, we have that $x_n$ is positively aligned with $x$, that is

$$x_n = \lambda x, \quad \lambda > 0.$$

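Now, we show that $n$ can be chosen in such a way that $\lambda < 1$. Indeed, from (5) we have

$$\lambda = \frac{\|x_n\|}{\|x\|} = \frac{\|y\|}{\|x\|} \left(1 + \tan^2\frac{\theta}{n}\right)^{n/2},$$

and it can be verified that

$$\lim_{n \to +\infty} \left(1 + \tan^2\frac{\theta}{n}\right)^{n} = \lim_{n \to +\infty} \cos^{-2n}\left(\frac{\theta}{n}\right) = 1,$$

since $n \log \cos(\theta / n) \to 0$. Because $\|y\| < \|x\|$, it follows that $\lambda < 1$ for a sufficiently large $n$. Now, write the difference vector in the form

$$\lambda x - y = \sum_{k=0}^{n-1} (x_{k+1} - x_k),$$

and observe that

$$\langle x_{k+1} - x_k, x_k \rangle = 0.$$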
By using (4) and (3), and proceeding by induction, we have

$$c \ge \Omega(\lambda x) = \Omega(x_n - x_{n-1} + x_{n-1}) \ge \Omega(x_{n-1}) \ge \cdots \ge \Omega(x_0) = \Omega(y),$$

so that $y \in S_c$. Since $S_c$ is closed and the closure of $\mathrm{int}(B) \setminus \mathrm{span}\{x\}$ is the whole ball $B$, every point $y \in B$ is also included in $S_c$. This proves that $S_c$ is either a closed ball centered at the origin, or the whole space $\mathcal{H}$.

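Finally, for any pair of points such that $\|x\| = \|y\|$, we have $x \in S_{\Omega(y)}$ and $y \in S_{\Omega(x)}$, so that

$$\Omega(x) = \Omega(y).$$

Hence $\Omega(x) = h(\|x\|)$ for some function $h$, which is nondecreasing by (4), i.e. (2) holds.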

□ 
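The rotation construction used in the proof of Theorem 1 can be reproduced in $\mathbb{R}^2$. The sketch below, with hypothetical vectors $x = (1, 0)$ and $y = (-0.3, 0.5)$, builds $x_0, \ldots, x_n$ as defined above, and checks that $x_n$ is positively aligned with $x$, that its norm follows (5), and that $\lambda < 1$ only once $n$ is large enough (for $n = 2$ one gets $\lambda > 1$).

```python
import math

def norm(v):
    return math.hypot(v[0], v[1])

def rotation_sequence(x, y, n):
    # x_0 = y, x_{k+1} = x_k + a_k u_k with a_k = ||x_k|| tan(theta / n), where u_k is the
    # unit vector in R^2 orthogonal to x_k with <u_k, x> > 0. Requires n >= 2.
    theta = math.acos((x[0] * y[0] + x[1] * y[1]) / (norm(x) * norm(y)))
    xk = y
    for _ in range(n):
        r = norm(xk)
        u = (-xk[1] / r, xk[0] / r)          # one of the two unit normals to x_k
        if u[0] * x[0] + u[1] * x[1] < 0:     # pick the one pointing towards x
            u = (-u[0], -u[1])
        a = r * math.tan(theta / n)
        xk = (xk[0] + a * u[0], xk[1] + a * u[1])
    return xk, theta

x, y = (1.0, 0.0), (-0.3, 0.5)               # 0 < ||y|| < ||x||, y not aligned with x
for n in (2, 10, 1000):
    xn, theta = rotation_sequence(x, y, n)
    lam = xn[0] / x[0]                        # x_n = lam * x since x = (1, 0)
    print(n, round(lam, 4), abs(xn[1]) < 1e-9)
```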
3 Representer theorem: a necessary and sufficient condition

In this section, we prove that condition (2) is necessary and sufficient for suitable families of regularization functionals of the type (1) to admit a linear representer theorem.

Theorem 2. Let $\mathcal{H}$ denote a Hilbert space such that $\dim \mathcal{H} \ge 2$. Let $\mathcal{F}$ denote a family of functionals $J : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ of the form (1) that admit minimizers.

1. If $\Omega$ satisfies (2), then $\mathcal{F}$ admits a linear representer theorem.

2. Conversely, assume that $\mathcal{F}$ contains a set of functionals of the form

$$J_\gamma^p(w) = \gamma f(\langle w, p \rangle) + \Omega(w), \quad \forall p \in \mathcal{H}, \ \forall \gamma \in \mathbb{R}_+, \tag{6}$$

where $f(z)$ is uniquely minimized at $z = 1$. For any lower semicontinuous $\Omega$, the family $\mathcal{F}$ admits a linear representer theorem only if (2) holds.

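Proof. The first part of the theorem (sufficiency) follows from an orthogonality argument. Take any functional $J \in \mathcal{F}$. Let $\mathcal{R} = \mathrm{span}\{w_1, \ldots, w_\ell\}$ and let $\mathcal{R}^\perp$ denote its orthogonal complement. Any minimizer $w^*$ of $J$ can be uniquely decomposed as

$$w^* = u + v, \quad u \in \mathcal{R}, \ v \in \mathcal{R}^\perp.$$

Since $\langle w^*, w_i \rangle = \langle u, w_i \rangle$ for all $i$, the error terms of $J(w^*)$ and $J(u)$ coincide. If (2) holds, then, since $\|w^*\|^2 = \|u\|^2 + \|v\|^2 \ge \|u\|^2$, we have

$$J(w^*) - J(u) = h(\|w^*\|) - h(\|u\|) \ge 0,$$

so that $u \in \mathcal{R}$ is also a minimizer.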

Now, let's prove the second part of the theorem. First of all, observe that the functional

$$J_\gamma^0(w) = \gamma f(0) + \Omega(w),$$

obtained by setting $p = 0$ in (6), belongs to $\mathcal{F}$. By hypothesis, $J_\gamma^0$ admits minimizers. In addition, since its only representer is $p = 0$, the representer theorem implies that the origin is a minimizer of $J_\gamma^0$, that is

$$\Omega(y) \ge \Omega(0), \quad \forall y \in \mathcal{H}. \tag{7}$$

Now take any $x \in \mathcal{H} \setminus \{0\}$ and let $p = x / \|x\|^2$, so that $\langle x, p \rangle = 1$. By the representer theorem, the functional $J_\gamma^p$ of the form (6) admits a minimizer of the type

$$w = \lambda(\gamma)\, x.$$
Now, take any $y \in \mathcal{H}$ such that $\langle x, y \rangle = 0$. By using the fact that $f(z)$ is minimized at $z = 1$, and the linear representer theorem, we have

$$\gamma f(1) + \Omega(\lambda(\gamma) x) \le \gamma f(\lambda(\gamma)) + \Omega(\lambda(\gamma) x) = J_\gamma^p(\lambda(\gamma) x) \le J_\gamma^p(x + y) = \gamma f(1) + \Omega(x + y).$$

Cancelling $\gamma f(1)$ from both ends of this chain, we conclude that

$$\Omega(x + y) \ge \Omega(\lambda(\gamma) x), \quad \forall x, y \in \mathcal{H} : \langle x, y \rangle = 0, \ \forall \gamma \in \mathbb{R}_+. \tag{8}$$
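Now, there are two cases:

• $\Omega(x + y) = +\infty$;

• $\Omega(x + y) = C < +\infty$.

In the first case, we trivially have

$$\Omega(x + y) \ge \Omega(x).$$

In the second case, combining the chain of inequalities above with (7), we obtain

$$0 \le \gamma\big(f(\lambda(\gamma)) - f(1)\big) \le \Omega(x + y) - \Omega(\lambda(\gamma) x) \le C - \Omega(0) < +\infty, \quad \forall \gamma \in \mathbb{R}_+. \tag{9}$$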

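Let $\gamma_k$ denote a sequence such that $\lim_{k \to +\infty} \gamma_k = +\infty$, and consider the sequence

$$\alpha_k = \gamma_k\big(f(\lambda(\gamma_k)) - f(1)\big).$$

From (9), it follows that $\alpha_k$ is bounded. Since $z = 1$ is the only minimizer of $f(z)$, the sequence can remain bounded only if

$$\lim_{k \to +\infty} \lambda(\gamma_k) = 1.$$

By taking the limit inferior in (8) for $\gamma \to +\infty$, and using the fact that $\Omega$ is lower semicontinuous, we obtain $\Omega(x + y) \ge \liminf_{k \to +\infty} \Omega(\lambda(\gamma_k) x) \ge \Omega(x)$. Exchanging the roles of $x$ and $y$ (the case where one of them is zero being trivial) gives condition (3). It follows that $\Omega$ satisfies the hypotheses of Theorem 1, therefore (2) holds. □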
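The limit $\lambda(\gamma) \to 1$ used in the proof can be seen in a case where everything is explicit. Assume (for illustration only) $f(z) = (1 - z)^2$ and $\Omega(w) = \|w\|^2$, which satisfies (2). Along the ray $w = \lambda x$ with $p = x / \|x\|^2$, the functional is $\gamma(1 - \lambda)^2 + \lambda^2 \|x\|^2$, minimized at $\lambda(\gamma) = \gamma / (\gamma + \|x\|^2)$, and $\alpha = \gamma\big(f(\lambda) - f(1)\big) = \gamma \|x\|^4 / (\gamma + \|x\|^2)^2$ stays bounded. The sketch minimizes along the ray by golden-section search (a hypothetical helper) and compares with the closed form.

```python
def golden_section(phi, lo, hi, tol=1e-10):
    # Minimizes a unimodal function phi on [lo, hi] by golden-section search.
    r = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    while b - a > tol:
        if phi(c) < phi(d):
            b, d = d, c
            c = b - r * (b - a)
        else:
            a, c = c, d
            d = a + r * (b - a)
    return (a + b) / 2

def lam_of_gamma(gamma, x_norm_sq):
    # J_gamma^p(lam x) = gamma * f(lam) + Omega(lam x) with the illustrative choices
    # f(z) = (1 - z)^2 and Omega(w) = ||w||^2, using <lam x, p> = lam for p = x / ||x||^2.
    J = lambda lam: gamma * (1.0 - lam) ** 2 + lam ** 2 * x_norm_sq
    return golden_section(J, 0.0, 2.0)

x_norm_sq = 4.0
for gamma in (1.0, 10.0, 1e3, 1e6):
    lam = lam_of_gamma(gamma, x_norm_sq)
    print(gamma, round(lam, 6), gamma * (1.0 - lam) ** 2)  # lam -> 1, alpha stays bounded
```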

The second part of Theorem 2 states that any lower semicontinuous regularizer $\Omega$ has to be of the form (2) in order for the family $\mathcal{F}$ to admit a linear representer theorem. Observe that $\Omega$ is not required to be differentiable or even continuous. Moreover, it need not have bounded lower level sets. For the necessary condition to hold, the family $\mathcal{F}$ has to be broad enough to contain at least a set of regularization functionals of the form (6). The following examples show how to apply the necessary condition of Theorem 2 to classes of regularization problems with standard loss functions.

• Let $L : \mathbb{R}^2 \to \mathbb{R} \cup \{+\infty\}$ denote any loss function of the type

$$L(y, z) = L(y - z),$$

such that $L(t)$ is uniquely minimized at $t = 0$. Then, for any lower semicontinuous regularizer $\Omega$, the family of regularization functionals of the form

$$J(w) = \gamma \sum_{i=1}^{\ell} L(y_i, \langle w, w_i \rangle) + \Omega(w)$$

admits a linear representer theorem if and only if (2) holds. To see that the hypotheses of Theorem 2 are satisfied, it is sufficient to consider the subset of functionals with $\ell = 1$, $y_1 = 1$, and $w_1 = p \in \mathcal{H}$. These functionals can be written in the form (6) with

$$f(z) = L(1, z).$$

• The class of regularization problems with the hinge (SVM) loss of the form

$$J(w) = \gamma \sum_{i=1}^{\ell} \max\{0, 1 - y_i \langle w, w_i \rangle\} + \Omega(w),$$

with $\Omega$ lower semicontinuous, admits a linear representer theorem if and only if $\Omega$ satisfies (2). For instance, by choosing $\ell = 2$, and

$$(y_1, w_1) = (1, p), \qquad (y_2, w_2) = (-1, p/2),$$

we obtain regularization functionals of the form (6) with

$$f(z) = \max\{0, 1 - z\} + \max\{0, 1 + z/2\},$$

and it is easy to verify that $f$ is uniquely minimized at $z = 1$.
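The requirement that $f$ be uniquely minimized at $z = 1$ can be sanity-checked on a grid for the two examples above, assuming the square loss $L(t) = t^2$ in the first one. For the hinge example, $f(z) = 2 - z/2$ on $[-2, 1]$ and $f(z) = 1 + z/2$ for $z > 1$, so the minimum value is $f(1) = 1.5$. A grid search is only a numerical illustration, not a proof; the function names are hypothetical.

```python
def f_square(z):
    # f(z) = L(1, z) with the square loss L(y, z) = (y - z)^2.
    return (1.0 - z) ** 2

def f_hinge(z):
    # f(z) = max{0, 1 - z} + max{0, 1 + z/2}, from (y_1, w_1) = (1, p), (y_2, w_2) = (-1, p/2).
    return max(0.0, 1.0 - z) + max(0.0, 1.0 + z / 2.0)

def grid_minimizers(f, lo=-5000, hi=5000, scale=1000):
    # All grid points z = k / scale attaining the smallest value of f on the grid.
    zs = [k / scale for k in range(lo, hi + 1)]
    best = min(f(z) for z in zs)
    return [z for z in zs if f(z) == best]

print(grid_minimizers(f_square), grid_minimizers(f_hinge), f_hinge(1.0))
```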

4 Conclusions 

We have shown that some general families of regularization functionals defined over a Hilbert space with lower semicontinuous regularizer admit a linear representer theorem if and only if the regularizer is a radial nondecreasing function. The result extends a previous characterization of [1] by relaxing the assumptions on the regularization term. We provide a unified proof that holds simultaneously for the finite- and infinite-dimensional cases.